Optimal Best Arm Identification with Fixed Confidence
Authors
Abstract
We give a complete characterization of the complexity of best-arm identification in one-parameter bandit problems. We prove a new, tight lower bound on the sample complexity. We propose the ‘Track-and-Stop’ strategy, which we prove to be asymptotically optimal. It consists of a new sampling rule (which tracks the optimal proportions of arm draws highlighted by the lower bound) and a stopping rule named after Chernoff, for which we give a new analysis.
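The two ingredients named in the abstract, a sampling rule that tracks target proportions and a likelihood-ratio stopping rule, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes Gaussian arms with a known common standard deviation, every arm pulled at least once, and at least two arms. The function names, the `threshold` argument, and the `target_weights` input are hypothetical. Computing the optimal proportions themselves is not shown here, so they are passed in by the caller.

```python
import math
from typing import Sequence


def glr_statistic(counts: Sequence[int], means: Sequence[float],
                  sigma: float = 1.0) -> float:
    """Smallest pairwise generalized likelihood ratio statistic between the
    empirical best arm and each rival (Gaussian arms, known std sigma).
    Assumes every count is at least 1 and there are at least two arms."""
    best = max(range(len(means)), key=lambda a: means[a])
    stats = []
    for b in range(len(means)):
        if b == best:
            continue
        n_a, n_b = counts[best], counts[b]
        gap = means[best] - means[b]
        # Closed form of N_a*d(mu_a, m) + N_b*d(mu_b, m) for Gaussian arms,
        # where m is the count-weighted mean of the two empirical means.
        stats.append(n_a * n_b / (n_a + n_b) * gap ** 2 / (2 * sigma ** 2))
    return min(stats)


def should_stop(counts: Sequence[int], means: Sequence[float],
                threshold: float, sigma: float = 1.0) -> bool:
    """Stop once the statistic exceeds a caller-supplied threshold
    (the choice of threshold is an assumption left to the caller)."""
    return glr_statistic(counts, means, sigma) > threshold


def next_arm(counts: Sequence[int], target_weights: Sequence[float]) -> int:
    """Tracking with forced exploration: pull a badly under-sampled arm if
    one exists, otherwise the arm furthest below its target share."""
    t, k = sum(counts), len(counts)
    under = [a for a in range(k) if counts[a] < math.sqrt(t) - k / 2]
    if under:
        return min(under, key=lambda a: counts[a])
    return max(range(k), key=lambda a: t * target_weights[a] - counts[a])
```

In a full loop, one would call `next_arm` to choose a pull, update `counts` and `means` with the observed reward, and return the empirical best arm as soon as `should_stop` is true.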
Related resources
Best-Arm Identification in Linear Bandits
We study the best-arm identification problem in linear bandits, where the rewards of the arms depend linearly on an unknown parameter θ and the objective is to return the arm with the largest reward. We characterize the complexity of the problem and introduce sample allocation strategies that pull arms to identify the best arm with a fixed confidence, while minimizing the sample budget. In parti...
Tight (Lower) Bounds for the Fixed Budget Best Arm Identification Bandit Problem
We consider the problem of best arm identification with a fixed budget T, in the K-armed stochastic bandit setting, with arms distribution defined on [0, 1]. We prove that any bandit strategy, for at least one bandit problem characterized by a complexity H, will misidentify the best arm with probability lower bounded by exp(−T / (log(K) H)), where H is the sum for all sub-optimal arms of the inv...
Pure Exploration in Infinitely-Armed Bandit Models with Fixed-Confidence
We consider the problem of near-optimal arm identification in the fixed confidence setting of the infinitely armed bandit problem when nothing is known about the arm reservoir distribution. We (1) introduce a PAC-like framework within which to derive and cast results; (2) derive a sample complexity lower bound for near-optimal arm identification; (3) propose an algorithm that identifies a nearl...
Pure Exploration in Episodic Fixed-Horizon Markov Decision Processes
Multi-Armed Bandit (MAB) problems can be naturally extended to Markov Decision Processes (MDP). We extend the Best Arm Identification problem to episodic fixed-horizon MDPs. Here, the goal of an agent interacting with the MDP is to reach a high confidence on the optimal policy in as few episodes as possible. We propose Posterior Sampling for Pure Exploration (PSPE), a Bayesian algorithm for pur...
Bayesian Best-Arm Identification for Selecting Influenza Mitigation Strategies
Pandemic influenza has the epidemic potential to kill millions of people. While various preventive measures exist (i.a., vaccination and school closures), deciding on strategies that lead to their most effective and efficient use, remains challenging. To this end, individual-based epidemiological models are essential to assist decision makers in determining the best strategy to curve epidemic s...